DIANet: Dense-and-Implicit Attention Network
Attention networks have successfully boosted performance in various vision
problems. Previous works emphasize designing new attention modules and plugging
them individually into networks. Our paper proposes a novel and simple
framework that shares one attention module across different network layers to
encourage the integration of layer-wise information; this parameter-sharing
module is referred to as the Dense-and-Implicit-Attention (DIA) unit. Many
choices of module can be used in the DIA unit. Since Long Short-Term Memory
(LSTM) has the capacity to capture long-distance dependencies, we focus on the
case where the DIA unit is a modified LSTM (referred to as DIA-LSTM).
Experiments on benchmark datasets show that the DIA-LSTM unit is capable of
emphasizing layer-wise feature interrelation and leads to significant
improvement in image classification accuracy. We further show empirically that
DIA-LSTM has a strong regularization ability, stabilizing the training of deep
networks in experiments where skip connections or Batch Normalization are
removed from the whole residual network. The code is released at
https://github.com/gbup-group/DIANet.
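The core idea above, a single attention module whose parameters and recurrent state are shared by every layer, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's DIA-LSTM: the gating weights, the pooled-statistic input, and the crude state update are all assumptions made for clarity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SharedDIAUnit:
    """Toy stand-in for a parameter-sharing DIA unit: one recurrent gating
    module reused by every layer (weights and update rule are illustrative)."""

    def __init__(self, n_channels):
        self.w = [0.5] * n_channels  # shared input weights (assumed values)
        self.u = [0.3] * n_channels  # shared recurrent weights (assumed values)
        self.h = [0.0] * n_channels  # hidden state carried across layers

    def __call__(self, channel_means):
        # Gate each channel from its pooled statistic and the hidden state,
        # so information from earlier layers shapes attention in later ones.
        gates = [sigmoid(w * m + u * h)
                 for w, u, m, h in zip(self.w, self.u, channel_means, self.h)]
        self.h = gates  # crude state update; the real unit uses LSTM cells
        return gates

dia = SharedDIAUnit(n_channels=2)
# The same `dia` object (same parameters, evolving state) serves both layers.
layer1_out = [f * g for f, g in zip([1.0, -2.0], dia([1.0, -2.0]))]
layer2_out = [f * g for f, g in zip([0.5, 0.5], dia([0.5, 0.5]))]
```

Because the hidden state persists across calls, the second layer's gates depend on the first layer's statistics, which is the "integration of layer-wise information" the abstract describes.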
Finite Expression Method for Solving High-Dimensional Partial Differential Equations
Designing efficient and accurate numerical solvers for high-dimensional
partial differential equations (PDEs) remains a challenging and important topic
in computational science and engineering, mainly due to the "curse of
dimensionality" in designing numerical schemes that scale in dimension. This
paper introduces a new methodology that seeks an approximate PDE solution in
the space of functions with finitely many analytic expressions and, hence, this
methodology is named the finite expression method (FEX). It is proved in
approximation theory that FEX can avoid the curse of dimensionality. As a proof
of concept, a deep reinforcement learning method is proposed to implement FEX
for various high-dimensional PDEs in different dimensions, achieving high and
even machine accuracy with a memory complexity polynomial in dimension and an
amenable time complexity. An approximate solution with finite analytic
expressions also provides interpretable insights into the ground truth PDE
solution, which can further help to advance the understanding of physical
systems and design postprocessing techniques for a refined solution.
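The idea of searching a space of finitely many analytic expressions can be illustrated with a toy solver. The sketch below scores hand-picked candidate expressions against the 1-D problem u'' + u = 0, u(0) = 0, u(pi/2) = 1; the candidate set, the grid, and the exhaustive search (standing in for the paper's reinforcement-learning search over operator trees) are all simplifications.

```python
import math

# A tiny dictionary of candidate "finite expressions"; the real FEX
# searches operator trees with deep reinforcement learning.
candidates = {
    "sin(x)": math.sin,
    "cos(x)": math.cos,
    "x": lambda x: x,
    "x^2": lambda x: x * x,
}

def residual(u, n=50):
    """Score a candidate against u'' + u = 0 on (0, pi/2) with
    u(0) = 0, u(pi/2) = 1, via central finite differences."""
    h = (math.pi / 2) / n
    xs = [i * h for i in range(n + 1)]
    score = abs(u(0.0)) + abs(u(math.pi / 2) - 1.0)  # boundary mismatch
    for i in range(1, n):
        upp = (u(xs[i - 1]) - 2.0 * u(xs[i]) + u(xs[i + 1])) / h ** 2
        score += abs(upp + u(xs[i])) * h  # interior PDE residual
    return score

best = min(candidates, key=lambda name: residual(candidates[name]))
```

The winning expression (here sin(x), the true solution) is itself the output, which is what gives FEX solutions their interpretability.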
Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise
Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of
an input image using the statistics of a batch of images, and hence BN
introduces noise into the gradient of the training loss. Previous works
indicate that this
noise is important for the optimization and generalization of deep neural
networks, but too much noise harms network performance. In our paper, we offer
a new point of view: a self-attention mechanism can help regulate the noise by
enhancing instance-specific information, yielding a better regularization
effect. We therefore propose an attention-based BN
called Instance Enhancement Batch Normalization (IEBN) that recalibrates the
information of each channel by a simple linear transformation. IEBN is good at
regulating noise and stabilizing network training, improving generalization
even in the presence of two kinds of noise attacks during training. Finally,
IEBN outperforms BN with only a slight parameter increment in image
classification tasks across different network structures and benchmark
datasets.
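The recalibration described above, BN followed by an instance-specific scale produced by a simple linear transformation, can be sketched for a single channel as follows. The choice of the instance mean as the statistic and the parameter values a, b are illustrative assumptions, not the paper's exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def iebn_channel(batch, gamma=1.0, beta=0.0, a=1.0, b=0.0, eps=1e-5):
    """Single-channel IEBN-style sketch: standard BN over the batch,
    then an instance-specific scale sigmoid(a * instance_mean + b).
    The statistic and the values of a, b are illustrative choices."""
    values = [v for inst in batch for v in inst]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    out = []
    for inst in batch:
        inst_mean = sum(inst) / len(inst)   # instance-specific statistic
        scale = sigmoid(a * inst_mean + b)  # simple linear transform + gate
        out.append([scale * (gamma * (v - mu) / math.sqrt(var + eps) + beta)
                    for v in inst])
    return out

batch = [[1.0, 2.0], [3.0, 4.0]]  # two instances, one channel each
out = iebn_channel(batch)
```

Because the scale depends on each instance's own statistic, instances are rescaled differently, which is how instance-specific information re-enters after batch-level normalization.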
On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver
The large-scale simulation of dynamical systems is critical in numerous
scientific and engineering disciplines. However, traditional numerical solvers
are limited by their choice of step size when performing numerical
integration, resulting in a trade-off between accuracy and computational
efficiency. To address this
challenge, we introduce a deep learning-based corrector called Neural Vector
(NeurVec), which can compensate for integration errors and enable larger time
step sizes in simulations. Our extensive experiments on a variety of complex
dynamical system benchmarks demonstrate that NeurVec exhibits remarkable
generalization capability on a continuous phase space, even when trained using
limited and discrete data. NeurVec significantly accelerates traditional
solvers, achieving speeds tens to hundreds of times faster while maintaining
high levels of accuracy and stability. Moreover, NeurVec's simple-yet-effective
design, combined with its ease of implementation, has the potential to
establish a new paradigm for fast-solving differential equations based on deep
learning.
Comment: Accepted by Scientific Reports.
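The corrector mechanism can be demonstrated on a linear ODE where the one-step Euler error is known in closed form. In NeurVec the correction term is a learned network; here we substitute the analytic error term, an assumption made purely to show how a corrector enables coarse, stable steps:

```python
import math

def f(u):
    return -u  # du/dt = -u, with exact solution u0 * exp(-t)

def corrector(u, dt):
    # NeurVec learns this term from data; for this linear ODE the exact
    # one-step Euler error is available in closed form, so we plug it in.
    return (math.exp(-dt) - (1.0 - dt)) * u

def simulate(u0, dt, steps, corrected):
    u = u0
    for _ in range(steps):
        u = u + dt * f(u) + (corrector(u, dt) if corrected else 0.0)
    return u

dt, steps, u0 = 0.5, 10, 1.0          # deliberately coarse step size
exact = u0 * math.exp(-dt * steps)
plain = simulate(u0, dt, steps, corrected=False)  # forward Euler only
fixed = simulate(u0, dt, steps, corrected=True)   # Euler + corrector
```

At this step size plain Euler is badly inaccurate, while the corrected stepper tracks the exact solution, which is the accuracy-at-large-steps trade NeurVec targets.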
Optimizing Shot Assignment in Variational Quantum Eigensolver Measurement
The rapid progress in quantum computing has opened up new possibilities for
tackling complex scientific problems. Variational quantum eigensolver (VQE)
holds the potential to solve quantum chemistry problems and achieve quantum
advantages. However, the measurement step within the VQE framework presents
challenges. It can introduce noise and errors while estimating the objective
function with a limited measurement budget. Such errors can slow down or prevent
the convergence of VQE. To reduce measurement error, many repeated measurements
are needed to average out the noise in the objective function. By consolidating
Hamiltonian terms into cliques, simultaneous measurements can be performed,
reducing the overall measurement shot count. However, limited prior knowledge
of each clique, such as its measurement noise level, poses a challenge. This
work introduces two shot assignment strategies based on estimating the standard
deviation of measurements to improve the convergence of VQE and reduce the
required number of shots. These strategies specifically target two distinct
scenarios: overallocated and underallocated shots. The efficacy of the
optimized shot assignment strategy is demonstrated through numerical
experiments conducted on an H2 molecule. This research contributes to the
advancement of VQE as a practical tool for solving quantum chemistry problems,
paving the way for future applications in complex scientific simulations on
quantum computers.
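A standard way to turn estimated standard deviations into a shot assignment is to allocate shots in proportion to each clique's sigma, which minimizes the variance of the summed estimator under a fixed budget. The sketch below shows this Neyman-style allocation; it illustrates the general principle, not the paper's specific strategies for the overallocated and underallocated regimes, and the noise levels are assumed values:

```python
def allocate_shots(sigmas, budget):
    """Assign measurement shots to cliques in proportion to their estimated
    standard deviations (a Neyman-style allocation; illustrative only)."""
    total = sum(sigmas)
    return [max(1, round(budget * s / total)) for s in sigmas]

def total_variance(sigmas, shots):
    # Variance of the summed energy estimator: sum_i sigma_i^2 / n_i
    return sum(s * s / n for s, n in zip(sigmas, shots))

sigmas = [4.0, 1.0, 1.0]  # assumed noise levels of three cliques
budget = 600              # total measurement shots available
adaptive = allocate_shots(sigmas, budget)
uniform = [budget // len(sigmas)] * len(sigmas)
```

Giving the noisiest clique the bulk of the budget yields a lower total variance than splitting shots uniformly, which is why convergence improves for the same shot count.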
Solving PDEs on Unknown Manifolds with Machine Learning
This paper proposes a mesh-free computational framework and machine learning
theory for solving elliptic PDEs on unknown manifolds, identified with point
clouds, based on diffusion maps (DM) and deep learning. The PDE solver is
formulated as a supervised learning task to solve a least-squares regression
problem that imposes an algebraic equation approximating a PDE (and boundary
conditions if applicable). This algebraic equation involves a graph-Laplacian
type matrix obtained via DM asymptotic expansion, which is a consistent
estimator of second-order elliptic differential operators. The resulting
numerical method solves a highly non-convex empirical risk minimization
problem, with the solution sought in a hypothesis space of neural-network-type
functions. In a well-posed elliptic PDE setting, when the hypothesis space
consists of feedforward neural networks with either infinite width or depth, we
show that the global minimizer of the empirical loss function is a consistent
solution in the limit of large training data. When the hypothesis space is a
two-layer neural network, we show that for a sufficiently large width, the
gradient descent method can identify a global minimizer of the empirical loss
function. Supporting numerical examples demonstrate the convergence of the
solutions and the effectiveness of the proposed solver in avoiding numerical
issues that hamper the traditional approach when a large data set becomes
available, e.g., large matrix inversion.
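The graph-Laplacian-type matrix at the heart of the method can be sketched from a point cloud with a Gaussian kernel. The construction below omits the density corrections of the full diffusion-maps estimator, and the bandwidth and sample size are illustrative choices:

```python
import math

def dm_laplacian(points, eps):
    """Graph-Laplacian-type operator from a point cloud via a Gaussian
    kernel: a stripped-down diffusion-maps construction that omits the
    density corrections used in the full estimator."""
    n = len(points)
    K = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps)
          for q in points] for p in points]
    P = [[k / sum(row) for k in row] for row in K]  # Markov normalization
    def L(f):
        # (P f - f) / eps approximates a second-order elliptic operator
        return [(sum(P[i][j] * f[j] for j in range(n)) - f[i]) / eps
                for i in range(n)]
    return L

# Point cloud sampled from an "unknown" manifold: the unit circle in R^2.
n = 40
pts = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
       for i in range(n)]
L = dm_laplacian(pts, eps=0.1)
const = L([1.0] * n)             # the operator annihilates constants
cosine = L([p[0] for p in pts])  # roughly proportional to -cos(theta)
```

Note that only the ambient coordinates of the points are used; no parametrization of the circle enters the construction, which is the sense in which the manifold is "unknown".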
Probing reaction channels via reinforcement learning
We propose a reinforcement learning based method to identify important
configurations that connect reactant and product states along chemical reaction
paths. By shooting multiple trajectories from these configurations, we can
generate an ensemble of configurations that concentrate on the transition path
ensemble. This configuration ensemble can be effectively employed in a neural
network-based partial differential equation solver to obtain an approximate
solution of a restricted Backward Kolmogorov equation, even when the dimension
of the problem is very high. The resulting solution, known as the committor
function, encodes mechanistic information for the reaction and can in turn be
used to evaluate reaction rates.
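The shooting step can be illustrated on a toy system: estimate the committor of a symmetric 1-D random walk by launching trajectories from a configuration and counting how often they reach the product state before the reactant state. The walk, the boundary states, and the trial count are illustrative assumptions standing in for real molecular trajectories:

```python
import random

def shoot(x0, reactant=0, product=10, trials=20000, seed=0):
    """Estimate the committor q(x0) for a symmetric 1-D random walk:
    the fraction of trajectories launched from x0 that reach the product
    state before the reactant state. A toy stand-in for shooting
    molecular trajectories from candidate configurations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = x0
        while reactant < x < product:
            x += 1 if rng.random() < 0.5 else -1
        hits += (x == product)
    return hits / trials

q_mid = shoot(5)  # exact committor for this symmetric walk is 5/10 = 0.5
q_hi = shoot(8)   # exact committor is 8/10 = 0.8
```

Configurations with committor near 0.5 are exactly the transition-path configurations the method seeks, and in high dimension the neural PDE solver replaces this brute-force sampling.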